Inversion of F0 model for natural-sounding speech synthesis
Authors
Abstract
Natural-sounding speech synthesizers require a model that describes prosody quantitatively. Fujisaki's model [1] has shown considerable accuracy for many languages [4][6]. We propose a method for estimating the parameters of Fujisaki's model, i.e. an inversion method, based on the relative extremes of the pitch contour and a gradient-based refinement procedure. Preliminary results show excellent performance of the proposed method in matching pitch contours. Preliminary synthesis results that use the estimated parameters are encouraging.
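For orientation, the forward direction of the model (parameters to F0 contour) can be sketched as below, using the form in which Fujisaki's model is commonly stated: log F0 is a baseline plus phrase components (impulse responses) and accent components (clipped step responses). The function names, the data layout, and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions, not values taken from this abstract. The `relative_extremes` helper only shows what locating relative extremes of a sampled contour might look like; it is not the authors' inversion procedure, and the gradient refinement step is not shown.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response Gp(t); alpha is an assumed typical value."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response Ga(t), clipped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb      : baseline frequency in Hz
    phrases : list of (onset_time, amplitude) phrase commands
    accents : list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

def relative_extremes(contour):
    """Return (maxima, minima): indices of strict local extremes of a sampled contour."""
    maxima, minima = [], []
    for i in range(1, len(contour) - 1):
        if contour[i - 1] < contour[i] > contour[i + 1]:
            maxima.append(i)
        elif contour[i - 1] > contour[i] < contour[i + 1]:
            minima.append(i)
    return maxima, minima

if __name__ == "__main__":
    # Hypothetical commands: one phrase command and one accent command.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    times = [i * 0.01 for i in range(150)]
    contour = [fujisaki_f0(t, 100.0, phrases, accents) for t in times]
    print(relative_extremes(contour))
```

An inversion method works in the opposite direction: given a measured contour, it searches for the command times and amplitudes whose synthesized contour matches it.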
Similar resources
A new model of intonation for use with speech synthesis and recognition
This paper describes a synthesis from analysis scheme for producing natural sounding intonation for speech synthesis. The paper presents a new method of describing F0 contours in terms of three basic phonetic intonation elements. Details are given of an automatic system for labelling F0 contours, which could be used for speech recognition purposes. Current work on extracting a phonological desc...
Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech
Recent studies have shown that the Tonal Center of Gravity is a better classifier than F0 Turning Points for at least two contrastively timed pitch accents in American English intonation contours. Within this framework, a binary F0 weighting function derived from the F0 contour can be used instead of the natural F0 contour without a degradation in discrimination performance. This success has im...
Applying a Hybrid into Seamless Speech
We present a speech synthesizer to seamlessly concatenate recorded and synthetic phrases to produce natural sounding and highly expressive speech. Not only the acoustic units, but also the F0 contours are seamlessly concatenated together from recorded and synthetic phrases. When mixed with recorded phrases, the F0 contours of synthetic phrases are generated adaptively relative to the actual sur...
A multi-layer F0 model for singing voice synthesis using a b-spline representation with intuitive controls
In singing voice, the fundamental frequency (F0) carries not only melody, but also music style, personal expressivity and other characteristics specific to voice production mechanism. The F0 modeling is therefore critical for a natural-sounding and expressive synthesis. In addition, for artistic purposes, composers also need to have control over expressive parameters of the F0 curve, which is m...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003